Navigating Data Structures using Search Algorithms

Part 1 - Introduction

In this tutorial series, we will construct below search algorithms step by step, intuitively and programmatically.

  1. Breadth First Search (BFS)
  2. Depth First Search (DFS)
  3. Uniform Cost Search (UCS)
  4. A Star (A*)

Prerequisite: Basic Python

Problem Definition:

We will learn by solving a problem. Below is a famous Arad to Bucharest path finding problem used in multiple AI courses across the globe. The lines connecting cities are assumed to be straight routes, cities considered as points, and cost mentioned is just the straight line distance (SLD) between the city geo spatial points (latitude, longitude). This is to keep away all complexities to understand the foundations better. Note, our initial problem is adapted heavily for our tutorial purposes.

The problem statement: find a route from Arad to Bucharest

Since we will learn the algorithm by programming, we shall create a dictionary explaining only the graph just with geo spatial position (approximately values from internet), and connecting cities. I have created another snippet (abstracted here), which would create a diagram for us from this dictionary to help us visualize as we go forward. Also it calculates and displays the actual "cost" between cities, by calculated the straight line distance between two geo spatial points by geodesic method.

In [1]:
from docHelpers_ipython import Doc, romania_location_map_full
from IPython.display import display, Image

romania_location_map_open_full = {
    'Arad' : {'pos': (21.31227,46.18656), 'connections': ['Sibiu','Timisoara','Zerind'] },
    'Sibiu' : {'pos': (24.12558,45.79833), 'connections': ['Fagaras','Rimnicu Vilcea'] },
    'Zerind' : {'pos': (21.51742,46.62251), 'connections': ['Oradea'] },
    'Timisoara' : {'pos': (21.20868,45.74887), 'connections': ['Lugoj'] },
    'Oradea' : {'pos': (21.91894,47.04650), 'connections': [] },
    'Fagaras' : {'pos': (24.97310,45.84164), 'connections': ['Bucharest'] },
    'Lugoj' : {'pos': (21.90346,45.69099), 'connections': ['Mehadia'] },
    'Rimnicu Vilcea' : {'pos': (24.36932,45.09968), 'connections': ['Craiova','Pitesti'] },
    'Mehadia' : {'pos': (22.36452,44.90411), 'connections': ['Dobreta'] },
    'Dobreta' : {'pos': (22.65973,44.63692), 'connections': [] },
    'Craiova' : {'pos': (23.79488,44.33018), 'connections': [] },
    'Pitesti' : {'pos': (24.86918,44.85648), 'connections': [] },
    'Bucharest' : {'pos': (26.10254,44.42677), 'connections': ['Giurgiu','Urziceni'] },
    'Giurgiu' : {'pos': (25.96993,43.90371), 'connections': [] },
    'Urziceni' : {'pos': (26.64112,44.71653), 'connections': ['Hirsova','Vaslui'] },
    'Vaslui' : {'pos': (27.72765,46.64069), 'connections': ['Lasi'] },
    'Lasi' : {'pos':(27.60144,47.15845), 'connections': ['Neamt'] },
    'Neamt' : {'pos': (26.38188,46.97587), 'connections': [] },
    'Hirsova' : {'pos': (27.94566,44.68935), 'connections': ['Eforie'] },
    'Eforie' : {'pos': (28.65273,44.04911), 'connections': [] }
}

doc = Doc(romania_location_map_open_full)
mappy = doc.showMap()
display(Image(mappy))

Note that all routes in above map are open ended. This setup is purpose fully done. This is now a tree problem. There are no closed loops. A tree is a type of Graph. A Graph might contain closed loops which we will see shortly.

Below is a sample illustration of how the SLD between cities are calculated. There could be any strategy. One could include fuel cost, curved routes cost, etc.

In [2]:
import geopy.distance
from math import floor
def calculate_GD(start, end):

    (x0,y0) = start
    (x1,y1) = end
    return floor(geopy.distance.geodesic((y0,x0), (y1,x1)).miles)

M = romania_location_map_open_full
cost = calculate_GD(M['Arad']['pos'],M['Sibiu']['pos'])  #calculating between Arad and Sibiu
cost
Out[2]:
138

For comparision, below is the original map, we have currently adapted from. Note ours is open ended for now, and costs are also different (Peter, author of AIMA book which is origin of this problem, never explained how he arrived at those values, so in my adaptation, I used own methodology)

AIMA Romania Problem

Ok, now we have established the problem, let us get in to our first algorithm next. For easy handling and understanding, we will use letters for cities instead of whole words as below.

In [3]:
romania_location_map_open = {
    'A' : {'pos': (21.31227,46.18656), 'connections': ['S','T','Z'] },
    'S' : {'pos': (24.12558,45.79833), 'connections': ['F','RV'] },
    'Z' : {'pos': (21.51742,46.62251), 'connections': ['O'] },
    'T' : {'pos': (21.20868,45.74887), 'connections': ['LU'] },
    'O' : {'pos': (21.91894,47.04650), 'connections': [] },
    'F' : {'pos': (24.97310,45.84164), 'connections': ['B'] },
    'LU' : {'pos': (21.90346,45.69099), 'connections': ['M'] },
    'RV' : {'pos': (24.36932,45.09968), 'connections': ['C','P'] },
    'M' : {'pos': (22.36452,44.90411), 'connections': ['D'] },
    'D' : {'pos': (22.65973,44.63692), 'connections': [] },
    'C' : {'pos': (23.79488,44.33018), 'connections': [] },
    'P' : {'pos': (24.86918,44.85648), 'connections': [] },
    'B' : {'pos': (26.10254,44.42677), 'connections': ['G','U'] },
    'G' : {'pos': (25.96993,43.90371), 'connections': [] },
    'U' : {'pos': (26.64112,44.71653), 'connections': ['H','V'] },
    'V' : {'pos': (27.72765,46.64069), 'connections': ['LA'] },
    'LA' : {'pos':(27.60144,47.15845), 'connections': ['N'] },
    'N' : {'pos': (26.38188,46.97587), 'connections': [] },
    'H' : {'pos': (27.94566,44.68935), 'connections': ['E'] },
    'E' : {'pos': (28.65273,44.04911), 'connections': [] }
}

doc = Doc(romania_location_map_open)
mappy = doc.showMap()
display(Image(mappy))

Next, we will develop Breadth First Search algorithm using above setup